# Aurora: A Cross-Layer Solution for Thermally Resilient Photonic Network-on-Chip

Zhongqi Li, Amer Qouneh, Madhura Joshi, Wangyuan Zhang, Xin Fu, and Tao Li

Abstract—With silicon optical technology moving toward maturity, the use of photonic networks-on-chip (NoCs) for global chip communication is emerging as a promising solution to the communication requirements of future many-core processors. Photonic NoCs are expected to play an important role in alleviating current power, latency, and bandwidth constraints. However, photonic NoCs are sensitive to ambient temperature variations because their basic constituents, ring resonators, are themselves sensitive to those variations. Since ring resonators are the basic building blocks of photonic modulators, switches, multiplexers, and demultiplexers, variations in on-chip temperature pose serious challenges to the proper operation of photonic NoCs. Previously proposed methods that mitigate the effects of temperature at the device level are either difficult to integrate into CMOS processes or unsuitable for large-scale implementation. In this paper, we propose Aurora, a thermally resilient photonic NoC architecture that supports reliable, low bit error rate (BER) on-chip communication in the presence of large temperature variations. Aurora leverages cross-layer solutions at the device, architecture, and operating system (OS) layers; each provides considerable improvements on its own, and together they provide even larger improvements. To compensate for small temperature variations, our design varies the bias current through ring resonators. For larger temperature variations, we propose architecture-level techniques that reroute messages away from hot regions and through cooler regions to their destinations. We also propose a thermal/congestion-aware coscheduling algorithm at the OS level that further lowers BER by reorganizing the thermal profile of the chip. Our simulation results show that Aurora provides a robust architectural solution for handling temperature variation effects in future photonic NoCs. 
For instance, average BER and message error rate are reduced by 96% and 85%, respectively, when the combined thermal optimization scheme [shortest path first + OS] is applied. From the perspective of power efficiency, Aurora is also superior to conventional photonic NoC architectures by as much as 37%.

Manuscript received October 21, 2012; revised June 11, 2013; accepted December 23, 2013. Date of publication February 25, 2014; date of current version January 16, 2015. This work was supported in part by the NSF under Grant 1117261, Grant 0937869, Grant 0916384, Grant 0845721, Grant 0834288, Grant 0811611, Grant 0720476, and in part by the Microsoft Research Trustworthy Computing, Safe and Scalable Multicore Computing Awards. Z. Li and A. Qouneh contributed equally to this work.

- Z. Li is with Qualcomm Inc., San Diego, CA 92121 USA (e-mail: zhongqili@ufl.edu).
- A. Qouneh and T. Li are with the Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 33206 USA (e-mail: aqouneh@ufl.edu; taoli@ece.ufl.edu).
- M. Joshi is with Infinera, Sunnyvale, CA 94035 USA (e-mail: madhura1301@gmail.com).
- W. Zhang is with Google, Mountain View, CA 94043 USA (e-mail: wangyuan.zhang@netapp.com).
- X. Fu is with the Department of Electrical and Computer Engineering, University of Kansas, Lawrence, KS 66045 USA (e-mail: xinfu@eecs.ku.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TVLSI.2014.2300477

*Index Terms*—Bit error rate (BER), photonic network-on-chip (NoC), thermally resilient.

## I. INTRODUCTION

THE COMMUNICATION fabric emerges as the critical performance factor when tens or hundreds of cores are integrated into a single chip, so a high-performance network is essential for efficient intercore communication. By sharing channels and paths, packets can be routed to their destinations with optimum bandwidth, latency, and power. However, electrical networks-on-chip (NoCs) do not scale well because of the large latencies associated with conventional RC wires and stringent power requirements [1], [2]. Recently, photonic NoCs have attracted considerable attention [3]–[12]. Compared with electrical NoCs, photonic NoCs offer higher bandwidth density, lower latency [13], and power consumption that is independent of path length. These characteristics make photonic NoCs a promising answer to the shortcomings of electrical NoCs.

In addition, wavelength division multiplexing (WDM) and time division multiplexing allow several channels to share an optical waveguide, thus increasing bandwidth density. An optical waveguide is a structure constructed from two materials with different refractive indices, which allows it to confine and guide light waves via total internal reflection. Unlike in electrical wires, energy in an optical link is expended only at the end points and in active intermediate resonators, which reduces power consumption significantly. Because optical signals propagate at a large fraction of the speed of light in vacuum, latencies are also reduced. Recent advances in integrating photonic devices with microelectronics in current CMOS technology have made possible the high-speed, low-power modulators, switches, and detectors that are essential to the design of photonic NoCs [14], [15]. The basic building block of these devices is the ring resonator, a waveguide shaped as a ring. Resonance occurs when a ring selectively couples one wavelength from a nearby waveguide and ignores the rest. This selectivity allows ring resonators to act as filters, switches, modulators, and detectors. However, it can be compromised because temperature variations change the refractive index [16], [17], shifting the resonance frequency.

Because a variation in temperature changes the refractive index, it can disrupt the proper operation of photonic devices. For instance, a small temperature variation can bring a ring resonator into or out of resonance. A resonance shift of 0.11 nm/K has been reported in ring resonators [17]. In addition, Manipatruni et al. [18] reported a high BER when a thermal shift of only a few kelvins caused a significant shift from the base resonant wavelength. Thus, small temperature variations can introduce a large BER or even cause faulty operation in photonic NoCs. Conventionally, metal strip heaters embedded around ring resonators [19] or overlaid on top of the silicon oxide cladding [20] are used to control the temperature of the resonators. However, these heaters require substantial electrical tuning power, exacerbate on-chip thermal effects, and are unsuitable for large-scale photonic NoCs because of their bulkiness and extensive wiring requirements. Other methods overlay a polymer coating with a negative thermooptic coefficient [21], [22]; unfortunately, such polymers are not yet compatible with CMOS processes. The International Technology Roadmap for Semiconductors (ITRS) [23] projects 3-D chip stacking as a viable solution to latency and power dissipation limitations. Hybrid photonic/electrical NoCs [9]–[12] have been proposed that are built on a separate layer on top of the core layer, with through-silicon vias connecting the two layers. Although latency and power dissipation improve, thermal effects are compounded by the heat generated in the other layers. Because heat is not easily removed from multilayer stacks, techniques that counter thermal effects on photonic NoCs at the architecture and operating system (OS) levels become imperative.

To mitigate temperature effects on photonic NoCs, we propose Aurora, a thermally resilient photonic NoC architecture design that can tolerate a wide range of temperature variations. Our proposed cross-layer solution targets device, architecture, and OS layers where each can significantly improve the reliability of the photonic NoC. More attractively, combining our proposed techniques provides significant reliability improvements and, as a side benefit, better power efficiency. Our first proposed technique deals with temperature variations within a small range. To achieve this at the device level, we adopt the method proposed in [18], which varies the bias current through a ring resonator to compensate for small local temperature variations. At the architectural level and for thermal variations across a large range, we propose to reroute the messages through cooler regions of the chip to their destinations. At the OS level, we use thermal/congestion-aware coscheduling to reorganize the thermal profile of the chip to further lower BER. Our simulation results show that average BER and message error rate (MER) are reduced by 96% and 84%, respectively, when the cross-layer thermal optimization scheme [shortest path first (SPF)+OS] is applied. To the best of our knowledge, this paper presents the first effort on improving thermal reliability of photonic NoCs at the architecture and OS levels.

The rest of this paper is organized as follows. In Section II, we review NoCs and photonic devices and describe their operation; we also show how temperature variations affect the refractive index and the operation of ring resonators. In Section III, we characterize the impact of on-chip temperature variations on the BER of photonic NoCs. In Section IV, we propose the Aurora architecture. Experimental setups are presented in Section V. Section VI evaluates the effectiveness of the proposed techniques. We present related work in Section VII and our concluding remarks in Section VIII.

## II. BACKGROUND

In this section, we briefly describe photonic NoCs and the thermal effects on them.

#### A. Overview of Photonic NoCs and Photonic Devices

A photonic NoC system is composed of laser sources, optical modulators, drivers, waveguides, photodetectors, and amplifiers. Laser sources are assumed to be off-chip because of their bulky size and power consumption. A plausible approach to integrating a photonic network into a chip is to stack a separate photonic layer containing all the optical components on top of the other layers using 3-D die stacking technologies [12]. To transmit messages, a continuous-wave laser beam is modulated using an external ON-OFF keying modulator. An external modulator is built by placing a ring (details in Section II-B) close to a waveguide. This arrangement has a resonance frequency that depends on the circumference of the ring and the material it is made from. The modulator is brought in and out of resonance by manipulating the refractive index of the ring, thereby turning the light beam on and off to represent digital bits. Waveguides are built by surrounding a core of high-refractive-index silicon with a cladding of lower-refractive-index silicon dioxide; both materials are commonly available in CMOS processes. Because silicon waveguides offer good light confinement and low transmission loss, bends and turns are possible. WDM [24] can be used to achieve high bandwidth density by transmitting different wavelengths simultaneously within the same waveguide with little interference. Since the frequency spectrum of a ring resonator looks like a comb filter, a single ring can switch several signals whose frequencies are multiples of a fundamental frequency. In addition, path multiplicity, whereby additional parallel paths are added, can also be used to increase bandwidth density. Photodetectors are built using Si-Ge-doped ring resonators. The doped section of the ring generates a photocurrent when the resonant wavelength circulating inside the ring is absorbed. The ability to extract or insert a particular light wavelength into a waveguide bus is crucial in photonic NoCs. 
By stacking an array of dissimilarly sized rings next to a waveguide bus, it is possible to add (multiplex) or drop (demultiplex) signals of different wavelengths [16].

#### B. Thermal Effects on Photonic Devices and NoCs

Extensive research has sought to create optical devices that can modulate, guide, and detect light signals efficiently while leveraging current CMOS processes. Of those devices, ring resonators are finding wide acceptance in the photonics and architecture communities as a basic building block for photonic circuits ranging from modulators to switches and multiplexers. Their compact size, low power consumption, low insertion loss, and high extinction ratio per unit length make them ideal for on-chip optical networks [18]. In this section, we give an overview of the structure and operation of ring resonators, the role of the refractive index, and the effects of temperature variations on their operation.

Fig. 1. Simplified layout of a ring modulator.

A ring resonator is built by placing a ring next to a straight waveguide, as shown in Fig. 1. The ring is designed so that its optical path length (its circumference multiplied by its effective refractive index) is an integer multiple of the wavelength traveling through the straight waveguide, so the refractive index of the materials that form the ring plays an important role in determining the resonance frequency. Resonance occurs when coupled light circulates inside the ring and is reinforced by constructive interference, while light traveling in the waveguide is suppressed. Changing the refractive index changes the resonance frequency. To control the refractive index, free-carrier injection [25] is used because of its high speed. In this method, two highly doped regions forming a p-i-n junction around the ring are built to form a modulator [26], as shown in Fig. 1. Applying a voltage $V_m$ to the P and N regions injects free-charge carriers into the ring, changing its effective refractive index. Injecting more free carriers decreases the refractive index, whereas extracting free-charge carriers increases it. Thus, p-i-n carrier injection and extraction effectively modulate the refractive index of the ring resonator. A ring resonator can be in one of two states. In the ON state, there are no free carriers in the ring because the p-i-n junction is reverse biased. By design, the resonance wavelength of the ring is the same as the wavelength of the light, so resonance is ON and the light is coupled into the ring. This coupling causes the optical signal to circulate inside the ring and prevents it from passing through the waveguide. In the OFF state, the p-i-n junction is forward biased, so free carriers are injected into the ring. The injection of free carriers changes the refractive index and in turn shifts the resonance wavelength. 
Since the resonant wavelength is now different from the wavelength of the light signal, resonance is OFF and the light continues its path unobstructed through the straight waveguide.

As described above, resonance occurs only at some specific frequencies where the light is coupled into the ring. The wavelengths at which resonance occurs [16] are governed by

$$\lambda_0 = n_{\text{eff}} L/m \tag{1}$$

where  $n_{\rm eff}$  is the effective refractive index, L is equal to  $2\pi r$ , where r is the radius of the ring, m is an integer number, and  $\lambda_0$  is the resonant wavelength. A shift of the effective refractive index results in a shift of the resonant wavelength [17]

$$\Delta \lambda = \lambda_0 (\Delta n_{\text{eff}} / n_{\text{eff}}) \tag{2}$$



Fig. 2. (a) Transmission spectra of a modulator under two different dc bias voltages. (b) Transmission spectra shifts due to changes in temperature.

where  $\Delta \lambda$  is a change in the resonant wavelength,  $\lambda_0$  is the resonant wavelength,  $\Delta n_{\rm eff}$  is a change in the effective refractive index, and  $n_{\rm eff}$  is the effective refractive index.
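As a numerical illustration of (1) and (2), the sketch below lists the resonant wavelengths of a ring inside a wavelength window and evaluates the shift predicted by (2) for a given index change. The effective index ($n_{\rm eff} = 2.5$), the 5-µm ring radius, and the 1500–1600-nm window are illustrative assumptions, not parameters of the simulated design, and the helper names are hypothetical.

```python
import math


def resonant_wavelengths(n_eff, radius_um, lam_min_nm, lam_max_nm):
    """Return (m, lambda_m) pairs from (1), lambda = n_eff * L / m, inside a window."""
    L_nm = 2.0 * math.pi * radius_um * 1000.0     # ring circumference L = 2*pi*r, in nm
    m_lo = math.ceil(n_eff * L_nm / lam_max_nm)   # lowest order (longest wavelength) in window
    m_hi = math.floor(n_eff * L_nm / lam_min_nm)  # highest order (shortest wavelength) in window
    return [(m, n_eff * L_nm / m) for m in range(m_lo, m_hi + 1)]


def wavelength_shift(lam0_nm, delta_n_eff, n_eff):
    """Resonance shift from (2): d_lambda = lambda_0 * d_n_eff / n_eff."""
    return lam0_nm * delta_n_eff / n_eff


if __name__ == "__main__":
    # Hypothetical ring: n_eff = 2.5, r = 5 um, 1500-1600-nm window.
    for m, lam in resonant_wavelengths(2.5, 5.0, 1500.0, 1600.0):
        print(f"m = {m}: lambda = {lam:.2f} nm")
    # Shift caused by a 0.001 change in effective index near 1540 nm.
    print(f"shift = {wavelength_shift(1540.0, 0.001, 2.5):.3f} nm")
```

The roughly 30-nm spacing between adjacent resonances in this example is what gives the ring its comb-like spectrum. In a real device the spacing is governed by the group index rather than $n_{\rm eff}$, which this sketch deliberately ignores.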

Fig. 2(a) shows the transmission spectra of a modulator at a nominal operating temperature  $\Delta T=0$  K. The figure shows that for a small positive increase in bias current, the spectrum shifts to the left due to a decrease in refractive index of the silicon caused by the injection of free carriers in the ring. Consequently, resonance occurs at a shorter wavelength than the original one. This shift means that a light wave at the original wavelength of  $\lambda_0$  will be allowed to pass since it has a high-transmission value. Before the shift, its transmission value at  $\lambda_0$  was small, and the wave was suppressed. Let  $n_{\text{new}} = n_{\text{eff}} - \Delta n$ , where  $\Delta n$  is a change in the refractive index, and  $\lambda_{\text{new}}$  is the new resonant wavelength. Substituting  $n_{\text{new}}$  in (1) gives  $\lambda_{\text{new}} = (n_{\text{eff}} - \Delta n)L/m = n_{\text{eff}}L/m - \Delta nL/m = \lambda_0 - \Delta \lambda$ , where  $\Delta \lambda$  is a change in the resonant wavelength.

In addition to the carrier injection method described above, the refractive index can also be altered by temperature variations. Because of silicon's relatively large thermooptic effect [27], ring resonators are sensitive to temperature variations. The thermooptic coefficient is $\Delta n/\Delta T = 1.86 \times 10^{-4}\ \text{K}^{-1}$. As in the carrier injection case, temperature variations change the refractive index and shift the resonance wavelength. A resonance shift of 0.11 nm/K from the original resonance wavelength has been reported in [17]. Such resonance shifts are undesirable and increase the BER in systems that use resonant electrooptic modulators and switches. Fig. 2(b) shows the transmission spectra shifts caused by 2- and 4-K temperature shifts; the original spectrum is at a nominal operating temperature $\Delta T = 0$ K and constant bias current. Note that temperature variation and free-carrier injection have opposite effects on the resonance frequency: an increase in temperature increases the refractive index and shifts the spectrum toward the right, whereas carrier injection shifts it toward the left. Thus, electrooptic and thermooptic effects can compensate each other.
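The opposing thermooptic and electrooptic shifts can be sketched by applying (2) to each index change separately. The thermooptic coefficient below is the value quoted above; the 1550-nm wavelength and the effective index of 2.62 are illustrative assumptions (with them, (2) happens to give a drift close to the 0.11 nm/K reported in [17]), and the function names are hypothetical.

```python
DN_DT = 1.86e-4  # thermooptic coefficient of silicon, 1/K (value quoted in the text)


def thermal_shift_nm(lam0_nm, delta_t_k, n_eff):
    """Red shift from (2) when heating raises the index by DN_DT * dT."""
    return lam0_nm * (DN_DT * delta_t_k) / n_eff


def carrier_shift_nm(lam0_nm, delta_n_carrier, n_eff):
    """Blue shift from (2) when injected carriers lower the index by delta_n_carrier."""
    return -lam0_nm * delta_n_carrier / n_eff


def net_shift_nm(lam0_nm, delta_t_k, delta_n_carrier, n_eff):
    """Net resonance shift; zero when the two index changes cancel."""
    return (thermal_shift_nm(lam0_nm, delta_t_k, n_eff)
            + carrier_shift_nm(lam0_nm, delta_n_carrier, n_eff))


if __name__ == "__main__":
    lam0, n_eff = 1550.0, 2.62  # illustrative values, not from the simulated design
    print(f"thermal drift: {thermal_shift_nm(lam0, 1.0, n_eff):.3f} nm/K")
    # Carrier-induced index drop sized to cancel a 4-K temperature rise.
    print(f"net shift: {net_shift_nm(lam0, 4.0, DN_DT * 4.0, n_eff):.3e} nm")
```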

Undesirable thermal shifts cause large BER and even faulty operation of a photonic NoC. With a rise in temperature, rings no longer resonate at the intended frequency, and modulators, switches, multiplexers, and demultiplexers produce erroneous outputs if thermal shifts are not addressed. Fig. 3 shows several scenarios comparing the intended and the actual outputs when a ring fails to resonate at the intended frequency because of a rise in temperature.

Fig. 3. Representative schematics of ring-resonator building blocks. (a) Switch: resonator fails to divert a light signal. (b) Multiplexer: resonator fails to add a light signal to a waveguide bus. (c) Demultiplexer: resonator succeeds in removing a light signal from a waveguide bus. (d) Modulator: modulator encodes erroneous data on a light stream (green).

## III. CHARACTERIZATION OF THERMAL IMPACT ON PHOTONIC NOCS

Several proposed architectures [9]–[12] place a photonic network layer on top of a silicon chip; we use the design proposed in [12] as a representative 3-D chip to characterize the impact of thermal variations on the reliability of photonic NoCs. In this section, we first describe the photonic network architecture; we then discuss BER as an indicator of thermal effects, quantify the BER due to temperature variations, and address temperature-sensing issues.

#### A. Photonic Network Architecture

Our simulated architecture is based on 3-D integration, where a photonic NoC is implemented as a layer of optical devices on top of a silicon chip. Such an arrangement reduces fabrication complexity, chip dimensions, and total cost. A 2-D folded torus hybrid NoC topology is used in this paper because it is compatible with tiled chip multiprocessors (CMPs), allows the use of low-radix switches, and allows light waves to intersect without significant crosstalk. The hybrid NoC architecture [12] combines a photonic circuit-switched network with an electrical packet-switched control network to reduce power consumption while achieving high bandwidth and low latency. A photonic interconnection network, composed of photonic interconnects, is used to transmit large messages, and an electronic control network controls the operation of the photonic network and handles the exchange of short messages.

In this paper, we simulate a 30-core processor tiled in a $5 \times 6$ 2-D arrangement. The detailed processor, memory, and NoC configuration can be found in Section V. To transmit a message, a path-setup packet is first sent on the electrical control network. As the packet is routed through the network, it reserves the corresponding photonic switches along its path.

Once the optical path is established, the message is transmitted through the photonic network. Only electronic packets are buffered during the path-setup phase. The routing path is selected at the source node and then carried by the electronic packets. Once the path has been acquired, photonic messages are transmitted directly to the destination without buffering or intermediate routing path selection.
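The reserve-then-transmit sequence can be illustrated with a toy reservation model. This is a sketch under simplifying assumptions that are ours, not the paper's: each photonic switch is treated as carrying at most one circuit, and a setup that meets a busy switch releases its partial reservation and fails so that the source can retry. The class and method names are hypothetical.

```python
from typing import Dict, List, Tuple

Switch = Tuple[int, int]  # (row, column) of a switch in the tiled layout


class CircuitReservations:
    """Toy model of circuit-switched path setup over photonic switches (hypothetical)."""

    def __init__(self) -> None:
        self.owner: Dict[Switch, int] = {}  # switch -> id of the message holding it

    def setup(self, msg_id: int, path: List[Switch]) -> bool:
        """Path-setup packet walks the source-selected path, reserving each switch."""
        reserved: List[Switch] = []
        for sw in path:
            if sw in self.owner:  # busy switch: undo the partial reservation and fail
                for r in reserved:
                    del self.owner[r]
                return False
            self.owner[sw] = msg_id
            reserved.append(sw)
        return True  # optical path established; payload can be sent without buffering

    def teardown(self, msg_id: int, path: List[Switch]) -> None:
        """Release the switches once the photonic message has been delivered."""
        for sw in path:
            if self.owner.get(sw) == msg_id:
                del self.owner[sw]


if __name__ == "__main__":
    net = CircuitReservations()
    print(net.setup(1, [(0, 0), (0, 1), (0, 2)]))  # True
    print(net.setup(2, [(1, 1), (0, 1)]))          # False: (0, 1) is held by message 1
    net.teardown(1, [(0, 0), (0, 1), (0, 2)])
    print(net.setup(2, [(1, 1), (0, 1)]))          # True
```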

In our simulated architecture, we assume that 64 wavelengths are used for modulation, resulting in 64 modulators and 64 photodetectors for a total of 128 ring resonators per core. In order to increase bandwidth density, path multiplicity can be used, where additional parallel waveguides are added to the network. These new paths will need additional modulators, multiplexers, demultiplexers, photodetectors, and switches. This will dramatically increase the number of ring resonators used in the photonic NoC.

#### B. Impact of Temperature on Ring Resonators

In 3-D packaging, the photonic network is usually implemented on top of the core layer, so it experiences large, nonuniform temperature variations that depend on the temperature of the cores below. Since the photonic layer consists of thousands of ring resonators, these variations can drastically compromise the operation of the photonic network. As described in Section II-B, they change the refractive index of ring resonators, causing the transmission spectra to shift unpredictably.

We simulated optical links with OptiSystem, an optical communication system simulation package [28], to obtain eye diagrams of the ring resonators. The simulated optical channel consists of a laser source, a signal generator that produces a 10-Gb/s pseudorandom nonreturn-to-zero (NRZ) code, a modulator that converts the NRZ code into optical signals, an intermediate resonator, and a demodulator. The resonance frequency was varied to simulate the effect of temperature variation.

As observed in Fig. 4(a), BER increases with temperature variation and reaches $10^{-12}$ at a temperature variation of $\sim 3.5$ K. A BER of $10^{-12}$ is sufficient for reliable on-chip communication [29], so uncompensated variations beyond about 3.5 K push the BER above this level. Fig. 4(b) shows eye diagrams for different temperature variations. Eye diagrams are used to qualitatively examine signal integrity and signal-to-noise ratio in a communication system. As temperature varies, the quality of the eye diagrams deteriorates, indicating reduced signal integrity. For example, the eye is large and wide open when the temperature shift is 0 K, which indicates nearly perfect signal transitions with almost zero BER; as the shift grows, the reduced coupling efficiency caused by thermal variation increases signal jitter and closes the eye.
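To relate the $10^{-12}$ target to eye-diagram quality, a common textbook approximation for NRZ links with Gaussian noise is $\text{BER} = \tfrac{1}{2}\operatorname{erfc}(Q/\sqrt{2})$, where $Q$ measures the eye opening relative to the noise. The sketch below inverts this relation by bisection. It is offered only for intuition and is not the model OptiSystem used to produce Fig. 4; the function names are hypothetical.

```python
import math


def ber_from_q(q: float) -> float:
    """Gaussian-noise NRZ approximation: BER = 0.5 * erfc(Q / sqrt(2))."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))


def q_for_ber(target_ber: float, lo: float = 0.0, hi: float = 20.0) -> float:
    """Smallest Q meeting target_ber, found by bisection (BER falls as Q grows)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if ber_from_q(mid) > target_ber:
            lo = mid  # eye not yet open enough
        else:
            hi = mid  # target met; try a smaller Q
    return hi


if __name__ == "__main__":
    print(f"Q needed for BER 1e-12: {q_for_ber(1e-12):.2f}")
```

Under this approximation, reaching $10^{-12}$ requires $Q \approx 7$, so even a modest loss of eye opening caused by thermal detuning quickly pushes the link past the target.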

To obtain runtime chip temperatures, we ran multicore-oriented workloads on a cycle-accurate multiprocessor simulator and fed the generated power traces into HotSpot [30]. We modified GARNET [31] to simulate the photonic NoC. We used average BER as an indicator of how temperature variations affect the operation of our simulated photonic network, obtaining the BER along each optical path by evaluating the temperature of the photonic devices involved. We observed that if temperature variations were left unaddressed, the average BER across the network would be unacceptably high $(>10^{-1})$ and all messages would be corrupted during transmission, implying the need for a thermally resilient photonic NoC architecture.

Fig. 4. Impact of temperature shift. (a) BER versus temperature shift. (b) Eye diagrams for various temperature shifts.

#### C. Temperature-Detecting Resonators

Temperature information of resonators is necessary for maintaining their initial operating conditions. Integrated temperature sensors like thermistors and resistance temperature detectors are usually used to measure the temperature within a chip. However, these conventional integrated sensors require large areas, making them unsuitable for large-scale photonic networks that contain thousands [12] or even millions [11] of ring resonators.

In Aurora, we employ resonators to measure temperature [32] because of their small area overhead and compatibility with CMOS technology. Because ambient temperature changes slowly, temperature detector data are distributed to all nodes through the electronic network at 1-s intervals. In these resonators, the output amplitude is related to the temperature variation. Resonators used for temperature detection are coupled to waveguides through splitters to minimize signal loading. In the implementation, the output signal of a resonator is amplified and converted by a root-mean-square detector into a dc current whose level indicates the amount of frequency shift [Fig. 5(c)]. A temperature-detecting resonator, along with its detection and control circuitry, is deployed in each switch and modulator set. The placement of these resonators within the modulator sets and the switches is shown in Fig. 5(a) and (b).

## IV. AURORA: THERMALLY RESILIENT PHOTONIC NOC ARCHITECTURE

We propose a holistic approach to mitigate the effect of temperature variations on the operation of photonic NoCs. Our techniques target the circuit, architecture, and OS levels. For small temperature variations, we adopt a circuit-level technique [18] that adjusts the bias current flowing through ring resonators to locally compensate for thermal effects. At the architecture level and for larger temperature variations, we reroute messages away from higher temperature regions and through cooler regions to their destinations. At the OS level, we employ a thermal/congestion-aware coscheduling technique to further reduce BER. Moreover, our circuit-, architecture-, and OS-level solutions can be combined with each other to reduce BER further.

Fig. 5. Placement of temperature-detecting resonators. (a) Modulator/demodulator sets. (b) Switches (detection and control circuits not shown). (c) Temperature-detecting circuit.

#### A. Circuit-Level Technique

We use the circuit-level technique proposed in [18] to combat temperature variations within a small range (e.g., 15 K). The heat generated by an appropriate dc bias current flowing through a ring resonator is used to maintain the original operating conditions: as the temperature varies, the bias current is adjusted to compensate for changes in local temperature and keep the resonant frequency at its original value. Fig. 6 shows the schematic diagram used to control the bias current through a p-i-n resonator-based modulator; only sectors of the ring and the N region are shown for clarity. A bias tee network combines a modulating signal with the dc bias to modulate the refractive index of the resonator via free-carrier injection and extraction, and its inductor and capacitor isolate the dc bias and the RF bit generator inputs from each other. In [18], modulation was maintained over a temperature rise of 15 K by changing the base operating condition from 1.36 mA at 0.2 V to 345 $\mu$A at 2.2 V bias. In nominal operation, reducing the bias current does not affect the modulation process because the high-speed RF signal injects the required amount of carriers to perform switching. This technique is limited to small temperature variations because the wavelength shift achievable by free-carrier injection and extraction is limited to $\sim$2 nm, whereas the wavelength shift due to temperature variations can be up to 20 nm [33].
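The boundary between resonators that the bias-current technique can handle and those the routing techniques must avoid can be sketched as a simple threshold test. The sketch assumes a linear drift of 0.11 nm/K [17] and a carrier-tuning range of about 2 nm, both quoted above; the function names and the grid of temperature deltas are hypothetical.

```python
from typing import List

SHIFT_PER_K_NM = 0.11  # resonance drift per kelvin, as reported in [17]
TUNING_RANGE_NM = 2.0  # approximate reach of free-carrier injection/extraction


def compensable(delta_t_k: float) -> bool:
    """True if the bias current can pull the resonance back to its original value."""
    return abs(delta_t_k) * SHIFT_PER_K_NM <= TUNING_RANGE_NM


def hot_mask(delta_t_grid: List[List[float]]) -> List[List[bool]]:
    """Mark tiles whose resonators exceed the circuit-level range (orange in Fig. 7)."""
    return [[not compensable(dt) for dt in row] for row in delta_t_grid]


if __name__ == "__main__":
    # Hypothetical temperature deltas (K) above the nominal point for a 2 x 3 patch.
    grid = [[3.0, 12.0, 25.0],
            [8.0, 19.0, 40.0]]
    for row in hot_mask(grid):
        print(["hot" if h else "ok" for h in row])
```

With these two numbers the threshold falls near 18 K, the same order as the 15-K range demonstrated in [18]; the actual limit depends on the device.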

#### B. Architecture-Level Technique

The circuit-level solution can mitigate the impact of small temperature variations. However, because running workloads vary, some regions of the chip may experience temperature variations beyond the compensation range of the circuit-level solution. We propose rerouting messages away from resonators within these regions and through cooler regions to their destinations. We propose two techniques based on shortest-path search: 1) SPF and 2) temperature first (TF). SPF selects the path with the lowest MER among all available shortest paths; on a tie, it selects the path with the lowest utilization. TF selects the lowest temperature path (i.e., the lowest MER path) between source and destination when the circuit-level technique is unable to compensate; on a tie, it considers route length and route utilization to mitigate link delay and avoid congestion.

Fig. 6. Schematic diagram of the bias circuit used for compensating small range temperature variations.

Fig. 7 shows the routes generated by the proposed algorithms under various thermal scenarios. The regions where the dc bias current was able to compensate for the resonance frequency shifts are indicated in white. The regions that are beyond the compensation range of the dc bias current are indicated in orange. Source and destination nodes are indicated in blue. The paths selected by SPF algorithm are indicated by A, and the paths selected by TF algorithm are indicated by B. As shown, our proposed routing algorithms search for a shortest distance path to the destination by avoiding hot regions and hence incur low MER. Messages that fail to find a cool path toward the destination incur a higher MER than messages that succeed. Messages that fail to be delivered are retransmitted after a timeout period.

The routing path is calculated at the source node. To compute a routing path, the source node gathers the temperature information of the resonators, which is distributed to all nodes through the electrical network. Before sending a packet, the source node first calculates the MER at each resonator along the routing path as $\mathit{mer}_i = 1 - (1 - \mathit{ber}_i)^n$, where $\mathit{ber}_i$ is the BER of resonator $i$ and $n$ is the number of bits in one message. The MER of the path is then obtained by multiplying the per-resonator success probabilities, i.e., $\mathit{mer} = 1 - \prod_{r \in P} (1 - \mathit{mer}_r)$, where $P$ is the set of resonators on that path. The source node then performs either the SPF or the TF algorithm using MER as the weight of a path: it selects the path with the minimum weight among the shortest distance paths (SPF) or among all paths (TF). Aurora employs an electrical/optical hybrid network structure, and path establishment is performed via the electrical network, so it is reasonable to assume that no error occurs when establishing the path. Deadlock in the electrical network can be avoided using virtual channel flow control, whereas the photonic network is inherently deadlock-free because of circuit switching and predetermined routing paths.

Fig. 7. Paths selected by the proposed routing algorithms under various thermal scenarios. (a)–(c) present three examples of the paths selected by SPF and TF algorithms, respectively.
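The SPF and TF selection steps can be sketched as Dijkstra searches on a 2-D mesh. Because $1 - \mathit{mer} = \prod_{r} (1 - \mathit{ber}_r)^n$, the quantity $-\ln(1 - \mathit{mer}) = -n \sum_r \ln(1 - \mathit{ber}_r)$ is additive and nonnegative along a path, so minimizing it with Dijkstra's algorithm minimizes the path MER without enumerating every path. SPF then orders candidate paths lexicographically by (hop count, MER cost, utilization) and TF by (MER cost, hop count, utilization). This is a sketch under stated assumptions, not the authors' implementation: each mesh node lumps the resonators a message passes at that hop into one BER value, utilization is summed along the path, and all names are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

Node = Tuple[int, int]  # (row, column) of a tile in the mesh


def resonator_mer(ber: float, n_bits: int) -> float:
    """Per-resonator MER for an n-bit message: mer_i = 1 - (1 - ber_i)^n."""
    return 1.0 - (1.0 - ber) ** n_bits


def path_mer(bers: List[float], n_bits: int) -> float:
    """Path MER: 1 - prod_r (1 - mer_r) over every resonator r on the path."""
    survive = 1.0
    for ber in bers:
        survive *= 1.0 - resonator_mer(ber, n_bits)
    return 1.0 - survive


def select_path(ber_grid: List[List[float]], src: Node, dst: Node, n_bits: int,
                policy: str = "SPF",
                util: Optional[List[List[float]]] = None) -> Tuple[List[Node], float]:
    """Return (path, MER) chosen by the SPF or TF policy."""
    rows, cols = len(ber_grid), len(ber_grid[0])
    if util is None:
        util = [[0.0] * cols for _ in range(rows)]

    def cost(node: Node) -> float:
        # -ln(1 - mer) contributed by this node; infinite if it always corrupts messages.
        ber = ber_grid[node[0]][node[1]]
        return math.inf if ber >= 1.0 else -n_bits * math.log1p(-ber)

    def key(state: Tuple[int, float, float]) -> Tuple:
        hops, c, u = state
        # SPF: fewest hops first, then lowest MER; TF: lowest MER first, then hops.
        return (hops, c, u) if policy == "SPF" else (c, hops, u)

    start = (0, cost(src), util[src[0]][src[1]])
    best: Dict[Node, Tuple] = {src: key(start)}
    prev: Dict[Node, Node] = {}
    heap = [(key(start), src, start)]
    while heap:
        k, node, state = heapq.heappop(heap)
        if k > best[node]:
            continue  # stale entry superseded by a better one
        if node == dst:
            break
        r, c = node
        for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nb[0] < rows and 0 <= nb[1] < cols):
                continue
            new = (state[0] + 1, state[1] + cost(nb), state[2] + util[nb[0]][nb[1]])
            if nb not in best or key(new) < best[nb]:
                best[nb] = key(new)
                prev[nb] = node
                heapq.heappush(heap, (key(new), nb, new))

    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    path.reverse()
    return path, path_mer([ber_grid[r][c] for r, c in path], n_bits)


if __name__ == "__main__":
    cool, hot = 1e-15, 1e-3  # hypothetical per-node BERs
    grid = [[cool, hot, cool],
            [cool, cool, cool],
            [cool, cool, cool]]
    for policy in ("SPF", "TF"):
        p, mer = select_path(grid, (0, 0), (0, 2), n_bits=1024, policy=policy)
        print(policy, p, f"MER={mer:.3g}")
```

In this example SPF is forced through the hot tile because it lies on the only two-hop path, whereas TF accepts a four-hop detour through the row below and obtains a far lower MER, mirroring the A and B paths in Fig. 7.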

Note that as the number of cores increases, the number of paths available for transmission also increases. However, if the source core, the destination core, or both are located in hot regions, a high MER is inevitable regardless of the selected path. In these situations, thermal management solutions such as dynamic clock disabling and dynamic frequency scaling can be invoked to halt or power off hot cores for a period of time [34] to guarantee reliable communication.

### C. OS-Level Technique

To further mitigate the effect of temperature variations on the photonic network and reduce MER, we propose a thermal/congestion-aware coscheduling scheme at the OS level. The OS distributes workloads across the multicore substrate to reorganize the temperature profile of the chip, giving the outer cores priority over the inner cores when mapping workloads to cores. Usually, a set of related workloads occupies adjacent cores, and the communication demand within the set is high, so we treat related workloads as one set when performing thermal/congestion-aware coscheduling. Fig. 8(a) shows a scenario in which the coscheduling technique relocates four workload sets (T1'–T4') to new locations (T1–T4). Workload sets can be rotated when necessary, as in the case of T3. If the outer cores are already occupied by other workloads, rescheduling is performed only when a workload set can be mapped as a block, which preserves efficient communication within the set. Fig. 8(b) shows the pseudocode of the coscheduling algorithm.
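As a hypothetical sketch of the policy just described (not the pseudocode of Fig. 8(b)), the Python below greedily places each workload set, largest first, as a contiguous block on the 5 × 6 core grid of Table I, allowing a 90-degree rotation and preferring the free block whose cores have the smallest total distance to the chip edge. The scoring rule, the largest-first order, and all names are assumptions for illustration.

```python
from itertools import product

ROWS, COLS = 5, 6  # 30 cores; the 5 x 6 arrangement is from Table I


def edge_distance(r, c, rows=ROWS, cols=COLS):
    """Hops from core (r, c) to the nearest chip edge (0 for outer-ring cores)."""
    return min(r, c, rows - 1 - r, cols - 1 - c)


def place_block(occupied, height, width, rows=ROWS, cols=COLS):
    """Return the free contiguous block (optionally rotated 90 degrees) whose
    cores have the smallest total distance to the chip edge, or None."""
    shapes = [(height, width)] if height == width else [(height, width), (width, height)]
    best = None
    for h, w in shapes:
        for r0, c0 in product(range(rows - h + 1), range(cols - w + 1)):
            cores = [(r0 + dr, c0 + dc) for dr in range(h) for dc in range(w)]
            if any(core in occupied for core in cores):
                continue  # block overlaps an already mapped set
            score = sum(edge_distance(r, c, rows, cols) for r, c in cores)
            if best is None or score < best[0]:
                best = (score, cores)
    return None if best is None else best[1]


def coschedule(workload_sets, rows=ROWS, cols=COLS):
    """Map each (name, height, width) workload set onto a block of cores,
    largest sets first; a set that cannot be placed as a block is skipped."""
    occupied, mapping = set(), {}
    for name, h, w in sorted(workload_sets, key=lambda s: s[1] * s[2], reverse=True):
        cores = place_block(occupied, h, w, rows, cols)
        if cores is None:
            continue  # keep the set at its current location
        occupied.update(cores)
        mapping[name] = cores
    return mapping
```

For instance, `coschedule([("T2", 1, 3), ("T1", 2, 2)])` places T1 in the top-left corner and T2 along the top edge next to it. A set that fits nowhere as a block is left unmapped, mirroring the rule that rescheduling happens only when a set can be mapped as a block.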

This thermal/congestion-aware coscheduling algorithm provides three benefits. First, relocating workloads to the edges of the chip helps reduce both peak and average chip temperatures, since the edges of a chip transfer heat to the ambient more efficiently than the center. Second, chip utilization and performance increase. Because of fragmentation, a new workload may be prevented from being allocated to contiguous cores, which increases communication latency; maintaining the shape of the workload sets and preferentially mapping workloads to the outer cores alleviates the impact of fragmentation. Third, the utilization of links located on the edge of the chip increases. Traditional traffic-based routing algorithms tend to route messages through the center of the chip, resulting in significant congestion in that area [35]. With coscheduling, workloads on outer cores can take advantage of the side links of the chip. However, coscheduling increases the average packet traveling distance and decreases the number of available routing paths. Fortunately, as we show in Section VI, this drawback can be largely compensated for by the inherently high speed and low power of optical transmission. This makes our thermal/congestion-aware coscheduling well suited to photonic networks.

Fig. 8. (a) Relocation of workloads by applying coscheduling. (b) Pseudocode for the coscheduling algorithm.

All techniques are applied based on feedback from temperature sensors, as shown in Fig. 9. The circuit-level technique is always on and precisely tunes the resonators. The architecture-level and OS-level techniques are turned on when the maximum temperature shift of the resonators exceeds a threshold of 15 K; together, they provide coarse-grained but larger-scale compensation.
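The activation rule can be written as a small decision function. This is a sketch only: the 15-K threshold comes from the text, while the input format (one temperature shift per monitored resonator, compared by magnitude) and the names are assumptions.

```python
# Hypothetical sketch of the cross-layer activation rule.
SHIFT_THRESHOLD_K = 15.0  # threshold stated in the text


def active_layers(resonator_shifts_k):
    """Return the Aurora layers to enable for one round of sensor readings.

    resonator_shifts_k: temperature shift of each monitored resonator from
    its nominal operating temperature, in kelvin (assumed input format;
    the magnitude of the shift is compared against the threshold).
    """
    layers = ["circuit"]  # dc-bias tuning is always on
    max_shift = max((abs(s) for s in resonator_shifts_k), default=0.0)
    if max_shift > SHIFT_THRESHOLD_K:
        layers += ["architecture", "os"]  # coarse-grained, larger-scale compensation
    return layers
```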

## V. EXPERIMENTAL SETUP

Fig. 9. Cooperation of all cross-layer technologies.

TABLE I
CHIP PARAMETERS

| Parameter                                            | Value                                   |
|------------------------------------------------------|-----------------------------------------|
| Number of cores                                      | 30, arranged as 5 × 6 in a folded torus |
| Convection resistance                                | 0.07 K/W                                |
| Convection capacitance                               | 240.4 J/K                               |
| Area of demodulator/photodetector set                | 660 μm × 40 μm                          |
| Area of switch                                       | 70 μm × 70 μm                           |
| Number of resonators in demodulator/photodetector set | 3840                                   |
| Number of resonators in switches                     | 720                                     |
| Number of resonators in temperature-detecting units  | 120                                     |
| Total number of ring resonators                      | 4680                                    |

In this paper, we used the Simics and General Execution-driven Multiprocessor Simulator (GEMS) simulation frameworks. Simics [36] provides a full-system functional simulation framework, whereas GEMS [37] provides a cycle-accurate timing simulator that models the timing of multiprocessor memory systems. We used GARNET [31], a detailed cycle-accurate on-chip network model incorporated in the GEMS framework, and extended it to support the proposed Aurora architecture. All simulations are performed on the network discussed in Section III. Table I lists the parameters of the simulated chip. We evaluated our techniques using a set of representative synthetic traffic patterns (i.e., uniform random, transpose, bit complement, and tornado [35]). GARNET generates traffic over a period of one million cycles (including 1 K warm-up cycles). We assume that the E/O and O/E conversions are carried out at 640 Gb/s (64 wavelengths at 10 Gb/s each). Because establishing an optical path is costly, particularly under heavy load, messages in photonic networks should be larger than those in traditional electrical networks to improve network performance. However, extremely large messages may block the network because the photonic network lacks virtual channels and buffers. Thus, in our simulations, we set the maximum message size to 13312 bits as a tradeoff between link efficiency and blocking probability.
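For a sense of scale (a sketch, assuming a message is striped across all 64 wavelengths and ignoring path setup and propagation delay), the serialization time of a maximum-size message at the 3-GHz clock assumed in this section is:

$$
t_{\text{ser}} = \frac{13312~\text{bits}}{64 \times 10~\text{Gb/s}} = \frac{208~\text{bits}}{10~\text{Gb/s}} = 20.8~\text{ns} \approx 62.4~\text{cycles at } 3~\text{GHz}
$$

Path setup over the electrical network is paid once per message, so larger messages amortize it over more bits, which is the link-efficiency side of the tradeoff described above.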

We simulated a 30-core processor with a private 512 KB L2 cache per core to generate the temperature profiles. We assume a 3-GHz clock frequency and 45-nm technology with a supply voltage of 1.2 V. Each core is 4 mm × 4 mm, for a total chip area of 20 mm × 24 mm. The baseline processor and memory architecture are listed in Table II. To evaluate our proposed techniques, we modeled all of the above components. To evaluate the efficiency of our proposed schemes under a wide range of temperature profiles, we constructed various thermal scenarios using the method described in Section III-B. Table III lists the characteristics of the thermal scenarios used to evaluate our techniques, and Fig. 10 shows the thermal map of each generated scenario.

## VI. EVALUATION RESULTS

In this paper, we evaluate the reliability and performance characteristics of the proposed Aurora architecture using different architecture- and OS-level thermal management schemes. Table IV lists the evaluated techniques. We assume that the circuit-level technique is always activated to achieve thermal stability under small temperature variations.

TABLE II
BASELINE MACHINE CONFIGURATION

| Parameter    | Configuration                                                   |
|--------------|-----------------------------------------------------------------|
| Width        | 4-wide fetch/issue/commit                                       |
| IQ, ROB, LSQ | 64 issue queue entries, 96 ROB entries, 48 LSQ entries          |
| TLB          | 128 entries (ITLB), 256 entries (DTLB), 4-way, 200 cycles       |
| Branch Pred. | 2 K entries Gshare, 10-bit global history, 32 entries RAS       |
| I/D L1 Cache | 64 KB, 4-way, 64 Byte/line, 2 ports, 3 cycles                   |
| Integer ALU  | 4 I-ALU, 2 I-MUL/DIV, 2 Load/Store                              |
| FP ALU       | 2 FP-ALU, 2 FP-MUL/DIV/SQRT                                     |
| L2 Cache     | Private 512 KB, 4-way, 128 Byte/line, 12 cycles                 |

TABLE III
THERMAL SCENARIOS

| ID | Scenario      | Synopsis                                                                                      |
|----|---------------|-----------------------------------------------------------------------------------------------|
| S1 | Center block  | A block of hot cores in the center forces traffic to use the edges as paths, Fig. 10(a)       |
| S2 | Corner block  | More than half of the hot cores are located at the corner, Fig. 10(b)                         |
| S3 | Winding path  | Hot regions force traffic to follow a winding path to the destination, Fig. 10(c)             |
| S4 | Narrow strait | Hot regions on both sides divide the processor into two sections, Fig. 10(d)                  |
| S5 | Side block    | A block of hot cores on one side forces traffic to use the center and the other side as paths, Fig. 10(e) |
| S6 | Random 1      | Randomly generated hot regions, Fig. 10(f)                                                    |
| S7 | Random 2      | Randomly generated hot regions, Fig. 10(g)                                                    |
| S8 | Random 3      | Randomly generated hot regions, Fig. 10(h)                                                    |

### A. NoC Latency

Fig. 11 shows the average latency of the simulated photonic NoCs under four traffic patterns (uniform random, transpose, bit complement, and tornado) and various thermal management techniques. As described in Section IV, the architecture-level techniques decrease the average BER but can introduce additional congestion, since messages tend to be routed through cool regions. In general, we observed that the average network latency increases by 5%–50% compared with the baseline cases. The network latency of SPF falls between those of TF and the baseline, since SPF takes both the path length and the BER into consideration. Fig. 11 further shows that, in most cases, network latency can be reduced by combining the OS-level technique with an architecture-level technique: compared with the SPF and TF cases, the average latency reductions of SPF+OS and TF+OS are 6% and 29%, respectively. This is because the OS-level technique diminishes the high-temperature regions within the chip and hence provides additional routing alternatives.



Fig. 10. Thermal maps of the generated scenarios. (a) Center block. (b) Corner block. (c) Winding path. (d) Narrow strait. (e) Side block. (f) Random 1. (g) Random 2. (h) Random 3.

Note that retransmitting corrupted messages incurs additional latency. Because our proposed architecture- and OS-level techniques dramatically reduce MER (see Section VI-B for details), and thus the probability of retransmission, accounting for retransmissions would increase the latency of the baseline case significantly more than that of our proposed techniques. On average, across all scenarios, over 72% of the messages need to be retransmitted if only the circuit-level technique is applied.
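The effect of MER on retransmission overhead can be estimated with a simple model, assumed here rather than taken from the paper: each transmission attempt fails independently with probability equal to the message's MER, every failed attempt costs one timeout period before the retry, and the final successful attempt costs the normal delivery latency. The number of attempts is then geometric with mean $1/(1 - \mathrm{mer})$. Function and parameter names are hypothetical.

```python
def expected_transmissions(mer):
    """Mean number of attempts until a message gets through, assuming every
    attempt fails independently with probability mer (geometric distribution)."""
    if not 0.0 <= mer < 1.0:
        raise ValueError("mer must be in [0, 1)")
    return 1.0 / (1.0 - mer)


def expected_delivery_latency(mer, success_latency, timeout):
    """Mean latency when each failed attempt costs `timeout` cycles before the
    retransmission starts and the final, successful attempt costs
    `success_latency` cycles (both hypothetical model parameters)."""
    failures = expected_transmissions(mer) - 1.0  # equals mer / (1 - mer)
    return success_latency + failures * timeout
```

With mer = 0.5, for example, a message needs two attempts on average, so `expected_delivery_latency(0.5, 100, 300)` evaluates to 400 cycles. Because the overhead term mer/(1 - mer) grows steeply as MER approaches 1, halving the MER cuts this term by more than half.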

### B. BER and MER

Fig. 12 shows the average BER of our simulated photonic NoC under various thermal management techniques. The first three bars in each group represent BER after applying the architecture-level techniques (i.e., Shortest Distance (SD), SPF, and TF). The next three bars show BER after applying both architecture- and OS-level techniques (i.e., SD+OS, SPF+OS, and TF+OS). As indicated, applying the architecture-level techniques alone reduces BER by 10% (SPF) and 50% (TF). On average, combining the architecture- and OS-level techniques further reduces BER by 93% and 92% for SPF+OS and TF+OS, respectively.

TABLE IV
EVALUATED TECHNIQUES

| Scheme | Routing Algorithm   | OS-level Technique |
|--------|---------------------|--------------------|
| SD     | Shortest-distance   | No                 |
| SD+OS  | Shortest-distance   | Yes                |
| SPF    | Shortest-Path First | No                 |
| TF     | Temperature First   | No                 |
| SPF+OS | Shortest-Path First | Yes                |
| TF+OS  | Temperature First   | Yes                |

Fig. 11. NoC latency. (a) Center block. (b) Corner block. (c) Winding path. (d) Narrow strait. (e) Side block. (f) Random 1. (g) Random 2. (h) Random 3.

In Fig. 12, the average BER of the SD (baseline) case is $1 \times 10^{-3}$ in Scenario 1, whereas it is $2 \times 10^{-4}$ in Scenario 2, which indicates that the average BER depends on the thermal map of the chip. The high BER in Scenario 1 is attributed to routes traversing the high-temperature region in the center of the chip. After the architecture-level technique is applied, BER is reduced significantly more in Scenario 1 than in Scenario 2, since more messages are rerouted through cooler paths. Furthermore, by relocating high-temperature regions to the outer cores, the OS-level technique opens up more cool paths through the center in Scenario 1 than in Scenario 2.

Among the SD, TF, and SPF cases, TF achieves the best BER, followed by SPF. This is because TF depends on the heat distribution in the network and thus tends to route messages through the regions with the lowest MER, whereas SPF considers the number of hops from source to destination in addition to temperature information. SPF and TF therefore expose a tradeoff between delay and error-rate improvement. In highly congested cases, TF shows a larger BER improvement (60%–80%) at the expense of increased network delay. The same observations hold for the TF+OS and SPF+OS cases.

We also recorded the average MER, i.e., the ratio of messages that fail to be delivered to the total number of messages, as shown in Fig. 13. Compared with the baseline case, SPF and TF improve MER by 6% and 30%, respectively, whereas SPF+OS and TF+OS achieve 77% and 85% improvement on average across our simulation scenarios.

### C. Power Consumption

Total power consumption in Aurora is mainly attributed to 1) the heat generated by the dc bias current (direct localized heating) for each ring resonator and 2) the energy consumed by the network to transmit messages. The static energy of the network is also converted to a per-bit scale and included in part 2), as in [38].

Compared with conventional metal strip heaters, maintaining the operating temperature by varying the dc bias current consumes $\sim\!50\%$ less energy [18] because of direct localized heating. The metal strip heaters are implemented in a metal layer atop the photonic layer. Because of the top cladding oxide between the metal layer and the waveguides, the metal strips cannot heat the resonators directly and are therefore power-inefficient. In contrast, the dc bias current provides localized heating in the p-i-n junction surrounding the resonator and is thus more efficient. We assume that the heater's size is $2~\mu\text{m} \times 2~\mu\text{m} \times 5~\mu\text{m}$ and that its surface heat release rate is $1~\text{mW}/\mu\text{m}^3$. The thickness of the top cladding oxide is assumed to be $1~\mu\text{m}$.

Fig. 12. Average BER of the network.

Fig. 13. Average MER of the network.

We also modeled the power consumption of both the electrical and the photonic networks. For the electrical network, the dynamic power consumed by data transmission is obtained through ORION [39], and the total power consumed by our 2-D $5 \times 6$ mesh electrical network is calculated as in [38]. For the photonic network, the resonators consume energy when free carriers are injected into the rings. The in-plane poly-Si energy consumption is 100 fJ/bit [40]. Assuming advanced driver circuits with poly-Si carrier lifetimes of 0.1–1 ns and a modulation speed of 10 Gb/s, the energy consumed by each modulator is $\sim\!200$ fJ/bit [40]. The energy consumption also depends on the link MER, since retransmitting messages that fail to be delivered costs additional energy. In addition, the power consumption of the photodetectors is related to the BER. In this paper, we adopted a target BER of $10^{-15}$ [1], [48] to ensure reliable end-to-end communication, which requires 5 μW of sensing power per photodetector [50]. The optical losses of the different photonic components are summarized in Table V [49].
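As a back-of-the-envelope check on these figures (a sketch, assuming one modulator and one photodetector per wavelength, all 64 wavelengths of a 640-Gb/s port active, and continuous 10-Gb/s modulation), the per-bit modulator energy converts to power as follows:

$$
P_{\text{mod}} = 200~\text{fJ/bit} \times 10~\text{Gb/s} = 2~\text{mW}, \qquad
64 \times 2~\text{mW} = 128~\text{mW}, \qquad
64 \times 5~\mu\text{W} = 0.32~\text{mW}
$$

The first two terms give the power of one modulator and of a fully active 64-wavelength transmitter; the third gives the photodetector sensing power of one 64-wavelength receiver. Retransmissions multiply the modulator energy spent per delivered message by the expected number of transmission attempts.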

We compare the power consumption of a network using conventional metal strip heaters with that of a network using the dc bias control method, as shown in Fig. 14(a) and (b). The dc-bias-current-driven heater is about twice as power efficient as conventional metal strip heaters. On average, the dc bias method consumes 32% less total power than the metal strip heater. Moreover, by leveraging the architecture-level routing and OS-level coscheduling techniques, Aurora saves a further 4% of power (TF+OS scheme) because fewer messages are retransmitted.

TABLE V
OPTICAL LOSS IN VARIOUS COMPONENTS

| Component                | Loss                      |
|--------------------------|---------------------------|
| Optical coupler          | 1 dB                      |
| Optical splitter         | 0.2 dB                    |
| Interlayer coupling loss | 1 dB                      |
| Filter through           | $10^{-4}$–$10^{-2}$ dB    |
| Filter drop              | 1.5 dB                    |
| Photodetector            | 0.1 dB                    |
| Waveguide loss           | 0.3 dB/cm                 |
| Bending loss             | 0.5 dB                    |
| Nonlinear loss           | 1 dB                      |
| Modulator insertion loss | 0–1 dB                    |
| Waveguide crossing       | 0.05 dB                   |
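Table V can be turned into a simple optical power budget, since losses in dB add along a light path and the launched power must cover the total. The sketch below is illustrative only: the component counts of the example path are hypothetical, the upper bound is used for the two ranged entries (a conservative choice), and it assumes (our reading, not stated in the text) that the 5 μW per photodetector is the optical power that must reach each detector.

```python
# Per-component optical losses in dB from Table V. For the two ranged entries
# (filter through, modulator insertion) the upper bound is used.
COMPONENT_LOSS_DB = {
    "coupler": 1.0,
    "splitter": 0.2,
    "interlayer_coupling": 1.0,
    "filter_through": 1e-2,
    "filter_drop": 1.5,
    "photodetector": 0.1,
    "bend": 0.5,
    "nonlinear": 1.0,
    "modulator_insertion": 1.0,
    "crossing": 0.05,
}
WAVEGUIDE_LOSS_DB_PER_CM = 0.3


def path_loss_db(component_counts, waveguide_cm):
    """Total loss of a light path: losses in dB add along a cascade."""
    total = sum(COMPONENT_LOSS_DB[name] * count for name, count in component_counts.items())
    return total + WAVEGUIDE_LOSS_DB_PER_CM * waveguide_cm


def required_input_power_uw(loss_db, detector_power_uw=5.0):
    """Optical power per wavelength that must be launched so that
    detector_power_uw still arrives after loss_db of attenuation."""
    return detector_power_uw * 10 ** (loss_db / 10)


# Hypothetical example path (component counts chosen for illustration only).
EXAMPLE_PATH = {
    "coupler": 1, "modulator_insertion": 1, "crossing": 4, "bend": 2,
    "filter_through": 20, "filter_drop": 1, "photodetector": 1,
}
```

For `EXAMPLE_PATH` with 1.5 cm of waveguide, `path_loss_db` returns 5.45 dB, and `required_input_power_uw(5.45)` gives about 17.5 μW per wavelength.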

Fig. 14. Comparison of network power consumption. (a) After applying architectural-level techniques (bars in each group, left to right: Heater, SD, SPF, TF). (b) After applying architectural- and OS-level techniques (bars in each group, left to right: Heater, SD+OS, SPF+OS, TF+OS). Each bar is divided into heater/dc-bias power and network power. The Heater scheme is a similar hybrid network that uses conventional metal strip heaters to compensate for temperature variations in resonators.

## VII. RELATED WORK

Recent advances in silicon-based photonic devices have raised the possibility of realizing photonic NoCs that satisfy the communication requirements of future multicore processors. Vantrease *et al.* [11] proposed a bus-based optical crossbar topology for a 256-core processor. Pan *et al.* [10] proposed a hierarchical bus-based optical crossbar topology in which the crossbar is partitioned into multiple crossbars in order to localize arbitration. Similarly, Cianchetti *et al.* [9] proposed an optical crossbar architecture with predecoded source routing. A different approach is taken in [12], where a 2-D folded torus topology utilizes a circuit-switched network, as discussed in Section III. A common factor among the above topologies is that they rely on high-speed modulators, switches, multiplexers, demultiplexers, or detectors that use ring resonators as building blocks, yet the thermal reliability of ring resonators is left unaddressed.

Temperature-aware routing was proposed for electrical NoCs in ThermalHerd [41], a distributed runtime thermal management scheme. ThermalHerd mitigates the hotspots generated at a router by reducing its workload using different routing schemes.

High-speed modulators have been demonstrated using two different structures: Mach–Zehnder interferometers [42] and resonant structures [26]. Of these, Mach–Zehnder interferometers have better thermal stability [16], but they are bulky and consume more power. Ring resonators are smaller but highly sensitive to temperature variations [16]. The effect of temperature variations on the operation of ring resonators has been mitigated by placing metal heaters on top of the waveguides. However, this method consumes significant power and is cumbersome to implement for a large number of ring resonators. Improvements on metal heaters have been proposed in [18], using localized heating due to the p-i-n junction dc bias current. Nawrocka *et al.* [17] have also observed the undesirable shift of the resonance frequency due to temperature variations. Biberman *et al.* [43] used integrated metal heaters to tune individual rings and align their resonant modes with the signals' wavelengths, mitigating the thermal impact on ring resonators. However, for a chip with thousands of ring resonators, it is difficult to monitor and control each ring separately. Another approach, taken in [44] and [45], is to induce pressure in the resonator using oxidation methods to compensate for the effect of temperature variations on the resonance frequency. Guha *et al.* [46] proposed passive temperature compensation by coupling a ring resonator to a Mach–Zehnder interferometer. A laser source can also be tuned to follow the resonance frequency shift of the rings to compensate for temperature effects. Tunable lasers are commonly used in communications and testing; Wang *et al.* [47] used a tunable laser with steps of 0.01 nm over the wavelength range of 1520–1580 nm to characterize microring resonators. Li *et al.* [51] use CMOS drivers that independently control the rising- and falling-edge preemphasis levels to compensate for the nonlinear transient behavior of ring modulators. Chen *et al.* [52] reduce temperature effects with a geometric design that chirps the refractive index of successive paired turns in the microcoil resonator. Although all of the above techniques have been demonstrated for a single resonator or a small number of resonators, they are difficult to apply to large-scale photonic NoCs because of implementation complexity or the unwieldy number of terminals needed to control the heaters. Until a proper solution at the device level is found, architecture- and OS-level solutions must provide reliable operation under wide thermal variations. To the best of our knowledge, there has been no prior work on designing thermally resilient photonic NoCs at the architecture and system levels.

## VIII. CONCLUSION

Photonic NoCs provide many benefits over their electrical counterparts, such as low latency, high bandwidth density, and repeaterless long-range transmission. Recent advances in silicon photonics and its integration with microelectronics present an opportunity to exploit these benefits in future multicore processors. To implement photonic NoCs, high-speed modulators, switches, multiplexers, demultiplexers, and detectors must be integrated into the chip. Recent work has demonstrated that ring resonators can be used as versatile devices to build photonic NoCs. However, architecting photonic NoCs presents new challenges; in particular, thermal effects are a major concern. Temperature variations shift the resonance frequency of a ring resonator, so the resonator responds to a different frequency than intended. As a result, unintended operations occur or data is corrupted, introducing a high BER or even causing faulty operation of the photonic NoC. Thermal effects thus deteriorate the reliability and performance of photonic NoCs, and the successful implementation of photonic NoCs hinges on the ability to overcome thermal challenges. Current device-level methods to counter the effect of temperature variations are difficult to implement in CMOS processes or are not suitable for large-scale on-chip networks.

In this paper, we address the effect of temperature variations on photonic NoCs and propose cross-layer solutions targeted at the circuit, architecture, and OS levels to mitigate this effect and improve reliability. At the circuit layer, we maintain the original temperature by varying the bias current through ring resonators. At the architecture layer, we reroute messages away from hot regions and through cooler regions to their destinations. At the OS layer, we employ thermal/congestion-aware coscheduling to relocate workloads to the outer cores, where heat dissipation is more efficient; this encourages messages to use the cooler center of the processor. The solutions can be combined to further reduce BER and improve reliability. We have shown that the average BER and MER are reduced by 95% (96%) and 77% (85%) for the SPF+OS (TF+OS) technique, respectively. We have also shown that the power consumption of Aurora can be reduced by up to 37% by applying the three techniques.

## REFERENCES

- [1] G. Chen, H. Chen, M. Haurylau, N. A. Nelson, D. H. Albonesi, P. M. Fauchet, *et al.*, "Predictions of CMOS compatible on-chip optical interconnect," *VLSI J.*, vol. 40, no. 4, pp. 434–446, Jul. 2007.
- [2] J. Owens, W. Dally, R. Ho, D. N. Jayasimha, S. Keckler, and L. Peh, "Research challenges for on-chip interconnection networks," *IEEE Micro*, vol. 27, no. 5, pp. 96–108, Sep./Oct. 2007.
- [3] N. Kirman, M. Kirman, R. K. Dokania, J. Martínez, A. B. Apsel, M. A. Watkins, *et al.*, "Leveraging optical technology in future bus-based chip multiprocessors," in *Proc. 39th Annu. IEEE/ACM Int. Symp. Microarchit.*, Dec. 2006, pp. 492–503.
- [4] D. Vantrease, N. Binkert, R. Schreiber, and M. Lipasti, "Light speed arbitration and flow control for nanophotonic interconnects," in *Proc. 42nd Annu. IEEE/ACM Int. Symp. Microarchit.*, Dec. 2009, pp. 304–315.
- [5] G. Chen, H. Chen, M. Haurylau, N. A. Nelson, D. H. Albonesi, P. M. Fauchet, et al., "On-chip optical interconnects: Challenges and critical directions," *IEEE J. Sel. Topics Quantum Electron.*, vol. 12, no. 6, pp. 1699–1705, Mar. 2007.
- [6] N. Kirman and J. F. Martínez, "An efficient all-optical on-chip interconnect based on oblivious routing," in *Proc. 15th ASPLOS*, Oct. 2010, pp. 15–28.
- [7] A. Joshi, C. Batten, Y. Kwon, S. Beamer, I. Shamim, K. Asanovic, *et al.*, "Silicon-photonic clos networks for global on-chip communication," in *Proc. 3rd ACM/IEEE Int. Symp. NOCS*, Apr. 2009, pp. 124–133.
- [8] V. Stojanovic, A. Joshi, C. Batten, Y. Kwon, and K. Asanovic, "Many-core processor networks with monolithic integrated CMOS photonics," in *Proc. 29th CLEO/QELS*, Jun. 2009, pp. 1–2.
- [9] M. J. Cianchetti, J. C. Kerekes, and D. H. Albonesi, "Phastlane: A rapid transit optical routing network," in *Proc. 36th Annu. ISCA*, Jun. 2009, pp. 441–450.
- [10] Y. Pan, P. Kumar, J. Kim, G. Memik, Y. Zhang, and A. Choudhary, "Firefly: Illuminating future network-on-chip with nanophotonics," in *Proc. 36th Annu. ISCA*, Jun. 2009, pp. 429–440.
- [11] D. Vantrease, R. Schreiber, M. Monchiero, M. McLaren, N. P. Jouppi, M. Fiorentino, *et al.*, "Corona: System implications of emerging nanophotonic technology," in *Proc. 35th Annu. ISCA*, Jun. 2008, pp. 153–164.
- [12] A. Shacham, K. Bergman, and L. Carloni, "Photonic networks-on-chip for future generations of chip multiprocessors," *IEEE Trans. Comput.*, vol. 57, no. 9, pp. 1246–1260, Sep. 2008.
- [13] D. Miller, "Rationale and challenges for optical interconnects to electronic chips," *Proc. IEEE*, vol. 88, no. 6, pp. 728–749, Jun. 2000.
- [14] Q. Xu, S. Manipatruni, B. Schmidt, J. Shakya, and M. Lipson, "12.5 Gbit/s carrier-injection-based silicon micro-ring silicon modulators," *Opt. Exp.*, vol. 15, no. 2, pp. 430–436, Jan. 2007.

- [15] T. Yin, R. Cohen, M. M. Morse, G. Sarid, Y. Chetrit, D. Rubin, *et al.*, "40Gb/s Ge-on-SOI waveguide photodetectors by selective Ge growth," in *Proc. OFC/NFOEC*, Feb. 2008, pp. 1–3.
- [16] M. Lipson, "Guiding, modulating, and emitting light on silicon-challenges and opportunities," *J. Lightw. Technol.*, vol. 23, no. 12, pp. 4222–4238, Dec. 2005.
- [17] M. S. Nawrocka, T. Liu, X. Wang, and R. R. Panepucci, "Tunable silicon microring resonator with wide free spectral range," *Appl. Phys. Lett.*, vol. 89, no. 7, pp. 071110-1–071110-3, Aug. 2006.
- [18] S. Manipatruni, R. K. Dokania, B. Schmidt, N. Sherwood-Droz, C. B. Poitras, A. B. Apsel, *et al.*, "Wide temperature range operation of micrometerscale silicon electro-optic modulator," *Opt. Lett.*, vol. 33, no. 19, pp. 2185–2187, Oct. 2008.
- [19] K. Lahiri, A. Raghunathan, and S. Dey, "System-level performance analysis for designing on-chip communication architectures," *IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.*, vol. 20, no. 6, pp. 768–783, Jun. 2001.
- [20] R. Amatya, C. W. Holzwarth, F. Gan, H. I. Smith, F. Kärtner, R. J. Ram, et al., "Low power thermal tuning of second-order microring resonators," in *Proc. CLEO/QELS*, May 2007, pp. 1–2.
- [21] P. Alipour, E. S. Hosseini, A. Eftekhar, B. Momeni, and A. Adibi, "Temperature-insensitive silicon microdisk resonators using polymeric cladding layers," in *Proc. CLEO/QELS*, Jun. 2009, pp. 1–2.
- [22] M. Han and A. Wang, "Temperature compensation of optical microresonators using a surface layer with negative thermooptic coefficient," *Opt. Lett.*, vol. 32, no. 13, pp. 1800–1802, Jul. 2007.
- [23] Semiconductor Industries Association. Washington, DC, USA. (2006). International Technology Roadmap for Semiconductors [Online]. Available: http://www.itrs.net/
- [24] B. G. Lee, X. Chen, A. Biberman, X. Liu, I. Hsieh, C. Chou, et al., "Ultrahigh-bandwidth silicon photonic nanowire waveguides for on-chip networks," *IEEE Photon. Technol. Lett.*, vol. 20, no. 6, pp. 398–400, Mar. 15, 2008.
- [25] R. A. Soref and B. R. Bennett, "Electrooptical effects in silicon," *IEEE J. Quantum Electron.*, vol. 23, no. 1, pp. 123–129, Jan. 1987.
- [26] Q. Xu, B. Schmidt, S. Pradhan, and M. Lipson, "Micrometre-scale silicon electro-optic modulator," *Nature*, vol. 435, pp. 325–327, May 2005.
- [27] M. Lipson, "Compact electro-optic modulators on a silicon chip," *IEEE J. Sel. Topics Quantum Electron.*, vol. 12, no. 6, pp. 1520–1526, Nov./Dec. 2006.
- [28] Optisystem Simulator, Optiwave, Ottawa, Canada, 2013.
- [29] L. Zheng, A. Mickelson, L. Shang, M. Vachharajani, D. Filipovic, W. Park, *et al.*, "Spectrum: A hybrid nanophotonic-electric on-chip network," in *Proc. DAC*, Jul. 2009, pp. 575–580.
- [30] K. Skadron, M. R. Stan, K. Sankaranarayanan, W. Huang, S. Velusamy, and D. Tarjan, "Temperature-aware microarchitecture: Modeling and implementation," *ACM Trans. Archit. Code Optim.*, vol. 1, no. 1, pp. 94–125, Mar. 2004.
- [31] N. Agarwal, T. Krishna, L. Peh, and N. K. Jha, "GARNET: A detailed on-chip network model inside a full-system simulator," in *Proc. IEEE ISPASS*, Apr. 2009, pp. 33–42.
- [32] M. A. Hopcroft, B. Kim, S. Chandorkar, R. Melamud, M. Agarwal, C. M. Jha, *et al.*, "Using the temperature dependence of resonator quality factor as a thermometer," *Appl. Phys. Lett.*, vol. 91, no. 1, pp. 013505-1–013505-3, Jul. 2007.
- [33] N. Sherwood-Droz, H. Wang, L. Chen, B. G. Lee, A. Biberman, K. Bergman, *et al.*, "Optical 4×4 hitless silicon router for optical networks-on-chip (NoC)," *Opt. Exp.*, vol. 16, no. 20, pp. 15915–15922, Sep. 2008.
- [34] G. M. Link and N. Vijaykrishnan, "Hotspot prevention through runtime reconfiguration in network-on-chip," in *Proc. Conf. DATE*, vol. 1. 2006, pp. 648–649, 2005.
- [35] W. J. Dally and B. Towles, *Principles and Practices of Interconnection Networks*. San Mateo, CA, USA: Morgan Kaufmann, 2004.
- [36] P. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hallberg, J. Hogberg, *et al.*, "Simics: A full system simulation platform," *Computer*, vol. 35, no. 2, pp. 50–58, 2002.
- [37] M. M. K. Martin, D. J. Sorin, B. M. Beckmann, M. R. Marty, M. Xu, A. R. Alameldeen, *et al.*, "Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset," *ACM SIGARCH Comput. Archit. News*, vol. 33, no. 4, pp. 92–99, 2005.
- [38] A. Shacham, K. Bergman, and L. P. Carloni, "The case for low-power photonic networks-on-chip," in *Proc. DAC*, 2007, pp. 132–135.

- [39] A. B. Kahng, B. Li, L. S. Peh, and K. Samadi, "ORION 2.0: A fast and accurate NoC power and area model for early-stage design space exploration," in *Proc. Conf. Des., Autom. Test Eur.*, 2009, pp. 423–428.
- [40] C. Batten, A. Joshi, J. Orcutt, A. Khilo, B. Moss, C. Holzwarth, *et al.*, "Building manycore processor-to-DRAM networks with monolithic silicon photonics," in *Proc. Hot Interconnects*, Aug. 2008, pp. 21–30.
- [41] L. Shang, L.-S. Peh, A. Kumar, and N. K. Jha, "Thermal modeling, characterization and management of on-chip networks," in *Proc. 37th Annu. IEEE/ACM Int. Symp. Microarchit.*, Dec. 2004, pp. 67–78.
- [42] W. Green, M. Rooks, L. Sekaric, and Y. Vlasov, "Ultra-compact, low RF power, 10 Gb/s silicon Mach-Zehnder modulator," *Opt. Exp.*, vol. 15, no. 25, pp. 17106–17113, 2007.
- [43] A. Biberman, N. Sherwood-Droz, B. G. Lee, M. Lipson, and K. Bergman, "Thermally active 4×4 non-blocking switch for networks-on-chip," in *Proc. 21st Annu. Meeting IEEE LEOS*, Nov. 2008, pp. 370–371.
- [44] P. Cheben, D. Xu, S. Janz, and A. Delage, "Scaling down photonic waveguide devices on the SOI platform," *Proc. SPIE*, vol. 5117, pp. 147–156, Jan. 2003.
- [45] S. M. Weiss, M. Molinari, and P. M. Fauchet, "Temperature stability for silicon-based photonic band-gap structures," *Appl. Phys. Lett.*, vol. 83, no. 10, pp. 1980–1982, 2003.
- [46] B. Guha, B. Kyotoku, and M. Lipson, "CMOS-compatible athermal silicon microring resonators," *Opt. Exp.*, vol. 18, no. 4, pp. 3487–3493, 2010.
- [47] M. Wang, H. Ng, D. Li, X. Wang, J. Martinez, R. Panepucci, et al., "Wavelength reconfigurable photonic switching using thermally tuned micro-ring resonators fabricated on silicon substrate," *Proc. SPIE*, vol. 6645, p. 66450I, Sep. 2007.
- [48] V. R. Almeida, C. A. Barrios, R. R. Panepucci, and M. Lipson, "All-optical control of light on a silicon chip," *Nature*, vol. 431, pp. 1081–1084, Aug. 2004.
- [49] N. Kirman, M. Kirman, R. K. Dokania, J. F. Martinez, A. B. Apsel, M. A. Watkins, et al., "Leveraging optical technology in future bus-based chip multiprocessors," in *Proc. IEEE/ACM Int. Symp. Microarchitecture*, Dec. 2006, pp. 492–503.
- [50] A. Melloni, M. Martinelli, G. Cusmai, and R. Siano, "Experimental evaluation of ring resonator filters impact on the bit error rate in nonreturn to zero transmission systems," *Opt. Commun.*, vol. 234, nos. 1–6, pp. 211–216, 2004.
- [51] C. Li, R. Bai, A. Shafik, E. Z. Tabasy, G. Tang, C. Ma, et al., "A ring-resonator-based silicon photonics transceiver with bias-based wavelength stabilization and adaptive-power-sensitivity receiver," in *Proc. IEEE Int. Solid-State Circuits Conf.*, Feb. 2013, pp. 124–125.
- [52] G. Y. Chen, T. Lee, X. L. Zhang, G. Brambilla, and T. P. Newson, "Temperature compensation techniques for resonantly enhanced sensors and devices based on optical microcoil resonators," *Opt. Commun.*, vol. 285, no. 23, pp. 4677–4683, 2012.



**Zhongqi Li** received the B.S. and M.S. degrees from the University of Electronic Science and Technology of China, Chengdu, China, in 2006 and 2009, respectively, and the Ph.D. degree from the University of Florida, Gainesville, FL, USA, in 2012.

He is currently with Qualcomm, Inc., San Diego, CA, USA, as an Adreno GPU Performance Architect for the Snapdragon mobile processor. He was with Marvell Semiconductor, Chandler, AZ, USA, as a Processor Performance Engineer (intern) for Marvell's next-generation ARM processor. His current research interests include CPU/GPU architecture, network-on-chip, and multicore processor systems.



**Amer Qouneh** received the B.S. degree in electrical engineering from Fairleigh Dickinson University, Hackensack, NJ, USA, in 1985, and the M.S. degree in computer engineering from the University of Florida, Gainesville, FL, USA, in 2010, where he is currently pursuing the Ph.D. degree in computer engineering.

He was with the Royal Scientific Society, Amman, Jordan, from 1993 to 2000. His current research interests include energy efficiency and power management in data centers, servers, and processors, power-aware scheduling, and high-performance computing.



**Madhura Joshi** received the B.S. degree in electronics from the University of Mumbai, Mumbai, India, in 2007, and the M.S. degree in computer engineering from the University of Florida, Gainesville, FL, USA, in 2010.

She was a Research Assistant with the IDEAL Laboratory, University of Florida, under the supervision of Dr. T. Li. She is currently a Software Engineer with Infinera, Sunnyvale, CA, USA, where she is involved in new generation optical networking systems. Her current research interests include computer architecture and new memory technologies.



**Xin Fu** received the B.S. degree in computer science from Central South University, Changsha, China, and the Ph.D. degree in computer engineering from the University of Florida, Gainesville, FL, USA, in 2003 and 2009, respectively.

She is currently an Assistant Professor with the Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS, USA. She was a Post-Doctoral Scholar with the Computer Science Department, University of Illinois at Urbana-Champaign, Urbana, IL, USA, from 2009 to 2010. Her current research interests include computer architecture, high-performance computing, hardware reliability and variability, mobile computing, and energy-efficient computing.

Dr. Fu was a recipient of the 2014 National Science Foundation (NSF) Faculty Early CAREER Award and the 2012 Kansas NSF EPSCoR First Award, and was a 2009 Computing Innovation Fellow.



**Wangyuan Zhang** received the bachelor's degree in electrical engineering and automation from Beihang University, Beijing, China, in 2005, and the master's degree in computer engineering and the Ph.D. degree in computer architecture from the University of Florida, Gainesville, FL, USA, in 2007 and 2010, respectively.

He is currently a Software Engineer with Google, Mountain View, CA, USA. His current research interests include memory system design using emerging memory technologies, 3-D microarchitecture, computer architecture, distributed systems, and storage systems.



**Tao Li** received the Ph.D. degree in computer engineering from the University of Texas at Austin, Austin, TX, USA.

He is an Associate Professor with the Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL, USA. His current research interests include computer architecture; microprocessor, memory, and storage system design; virtualization technologies; energy-efficient, sustainable, and dependable data centers; cloud and big data computing platforms; the impacts of emerging technologies and applications on computing; and evaluation of computer systems.

Dr. Li received the 2009 National Science Foundation Faculty Early CAREER Award, the 2006, 2007, and 2008 IBM Faculty Awards, the 2008 Microsoft Research Safe and Scalable Multicore Computing Award, and the 2006 Microsoft Research Trustworthy Computing Curriculum Award. One of his papers received the Best Paper Award at HPCA 2011, and three of his papers were nominated for Best Paper Awards at DSN 2011, MICRO 2008, and MASCOTS 2006.